A Bayesian approach to Nested Clade Analysis

Author

  • Ioanna Manolopoulou
Abstract

The purpose of this study is to identify genetically distinct clusters of individuals based on related characteristic traits (phenotypic data) or geographical locations (phylogeographic data). The process has two main steps: inferring the genetic history of the sequences under study, and then identifying significant clusters according to the phenotypic or phylogeographic measurements. Given an evolutionary model and an appropriate model for the distribution of the phenotype, such inference is possible in a number of ways. However, because of the multi-level uncertainty and the complexity of the models, the methods must avoid stepwise optimization if they are to give statistically reliable conclusions. The main methods currently used for this type of analysis are Nested Clade Analysis (NCA) for phenotypic data and Nested Clade Phylogeographic Analysis (NCPA) for phylogeographic data. In short, they find the optimal genetic history under a simplified evolutionary model and then, treating the inferred history as fixed, identify significantly different clusters for the phenotype or geography using nested analysis of variance and permutation tests. Such methods do not allow the uncertainty of each step to propagate fully through the model, and simulations have shown that they often lead to false conclusions. Here we describe a coherent statistical framework for NCA/NCPA by taking a (Reversible Jump) Markov chain Monte Carlo approach to the genetic clustering problem. By considering a general evolutionary model and clustering constructions using haplotype trees for the phenotypic and phylogeographic analysis respectively, we construct a holistic method in order to obtain the global optimum of the parameters of interest.

Several challenges arise in this process. The presence of homoplasy (representing convergent evolution, usually through back mutations) can obscure the analysis by increasing the number of possible histories that underlie the data. This leads to intractable likelihoods and normalisation constants, which we address using Approximate Bayesian Computation. In addition, the parameter space of clusterings is vast, so we employ adaptive methods and efficient proposals to ensure mixing and convergence. Lastly, we address issues inherent in similar clustering and phylogenetic inference problems, such as label-switching (for the cluster parameters) and the representation of trees (essential for convergence assessment). We apply our method to three datasets and discuss the results in relation to NCA and NCPA.
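
To make the idea behind Approximate Bayesian Computation concrete, the following is a minimal rejection-sampling sketch on a toy problem. It is not the method of this work, which applies to haplotype trees within a (Reversible Jump) MCMC scheme. The simulator, the uniform prior, the summary statistic (a count of mutated sites), and all function and parameter names are assumptions made purely for illustration. The point it shows is that the likelihood is never evaluated: parameters are drawn from the prior, data are simulated, and a draw is kept only when the simulated summary is close to the observed one.

```python
import random


def simulate_mutated_sites(theta, n_sites, rng):
    """Toy simulator (assumed): each site mutates independently with probability theta."""
    return sum(1 for _ in range(n_sites) if rng.random() < theta)


def abc_rejection(observed, n_sites, n_draws, tolerance, seed=0):
    """Return prior draws of theta whose simulated summary lies within
    `tolerance` of the observed summary (number of mutated sites)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from the assumed Uniform(0, 1) prior
        simulated = simulate_mutated_sites(theta, n_sites, rng)
        # Accept without ever computing a likelihood.
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    sample = abc_rejection(observed=6, n_sites=30, n_draws=20000, tolerance=0)
    print(len(sample), sum(sample) / len(sample))
```

In this toy case the exact posterior is Beta(7, 25), so with zero tolerance the mean of the accepted draws should be close to 7/32. A larger tolerance accepts more draws at the cost of a coarser approximation, which is the usual trade-off in ABC.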

Publication date: 2008